Abstract

Multi-dimensional data classification is an important and challenging problem in many astro-particle experiments. Neural networks have proved to be versatile and robust in multi-dimensional data classification. In this article we study the separation of gammas from hadrons for the MAGIC experiment. Two neural networks have been used for the classification task: a Multi-Layer Perceptron, based on supervised learning, and a Self-Organising Map (SOM), based on an unsupervised learning technique. The results are presented, and possible ways of combining these networks are proposed to yield better and faster classification.

Keywords: Neural Networks, Multidimensional data classification, Self-Organising Maps, Multi-layer Perceptrons.



1 Introduction 

Many high-energy gamma-ray experiments have to deal with the problem of separating gammas from hadrons. The experiments usually generate large data sets with many attributes. This multi-dimensional data classification problem poses the daunting challenge of extracting a small number of interesting events (gammas) from an overwhelming sea of background (hadrons). Many techniques for addressing this problem are under active research, ranging from classical statistical techniques to more sophisticated ones such as neural networks, classification trees and kernel functions.

Neural networks provide an automated technique for classifying a data set into a given number of classes [2], and are an active research topic in both the artificial intelligence and the machine learning communities. Several neural network models have been developed to address the classification problem. One usually distinguishes between supervised and unsupervised classifiers. A supervised classifier is used when the analyst has some examples for which the correct classification is known. This is the case, for example, in most problems related to particle physics at accelerators, where the detectors and the underlying physics are generally well understood and good simulations are available. In an unsupervised technique, by contrast, the events are partitioned into classes of similar elements without using additional information. This is especially the case for fields operating in a discovery regime, e.g. astroparticle physics [3].

From a mathematical perspective, a neural network is simply a mapping $f\colon \mathbb{R}^n \to \mathbb{R}^m$, where n is the input data set dimension and m is the output dimension of the neural network. The network is typically divided into layers; each layer has a set of neurons, also called nodes or information units, connected together by links. Artificial neural networks are able to classify data by learning to discriminate patterns in the features (or parameters) associated with the data. The network learns from the data set as each data vector of the input set is presented to it. The learned information is stored in the links (weights) associated with the neurons.

The output generated by the network depends on both the problem and the network type. For the gamma/hadron separation problem, the supervised network maps each input vector onto the [0,1] interval, whereas in unsupervised networks the nodes are adapted to the input vectors in such a way that the output of the network represents the natural groups that exist in the data set. A visualisation technique is then used to view the groups discovered by the network.

Section 2 describes the data sets used for the classification. Section 3 deals with the multilayer perceptron network and its classification results. Section 4 deals with Self-Organising Maps, in two kernel variants, and their classification results. Conclusions and future perspectives are discussed in Section 5.

2 Data set description

The data sets were generated by a Monte Carlo program, CORSIKA [4]. They contain 12332 gamma events, 7356 'on' events (a mixture of gammas and hadrons), and 6688 hadron events. These events are stored in separate files. The files contain the event parameters in ASCII format, each line of 12 numbers being one event [5], with the parameters defined below:

1. fLength: major axis of ellipse [mm] 

2. fWidth: minor axis of ellipse [mm] 

3. fSize: 10-log of sum of content of all pixels 

4. fConc: ratio of sum of two highest pixels over fSize [ratio] 

5. fConc1: ratio of highest pixel over fSize [ratio]

6. fAsym: distance from highest pixel to centre, projected onto major axis [mm]

7. fM3Long: 3rd root of third moment along major axis [mm]

8. fM3Trans: 3rd root of third moment along minor axis [mm]

9. fAlpha: angle of major axis with vector to origin [deg]

10. fDist: distance from origin to centre of ellipse [mm] 

11. fEner: 10-log of MC energy [in GeV] 

12. fTheta: MC zenith angle [rad] 

The first 10 image parameters are derived from pixel analysis, and are used 
for classification. 
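To make the file format concrete, here is a minimal C++ sketch for reading one event line, assuming whitespace-separated columns in the order listed above. The names Event, parseEvent and features are hypothetical and not part of any MAGIC or ROOT software.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// One event as stored in the ASCII files: 12 numbers per line, in the order
// listed above (fLength ... fTheta).
struct Event {
    std::array<double, 12> values{};
};

// Parse one line into an Event; returns std::nullopt if fewer than 12
// numbers can be read (e.g. a blank or malformed line).
std::optional<Event> parseEvent(const std::string& line) {
    std::istringstream in(line);
    Event ev;
    for (double& v : ev.values) {
        if (!(in >> v)) return std::nullopt;
    }
    return ev;
}

// Keep only the first 10 image parameters (fLength ... fDist), which are the
// classification inputs; fEner and fTheta are Monte Carlo values.
std::array<double, 10> features(const Event& ev) {
    std::array<double, 10> f{};
    for (std::size_t k = 0; k < f.size(); ++k) f[k] = ev.values[k];
    return f;
}
```

When reading a file line by line with std::getline, lines for which parseEvent returns no value can simply be skipped.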

3 Multi-Layer Perceptron



For this approach we used the ROOT Analysis Package (v. 4.00/02), and in particular its multilayer perceptron class, TMultiLayerPerceptron [6], which implements a generic layered network. Since this is a supervised network, we took half of the gamma and OFF data to train the network and the remaining data to test it. The code of the ROOT package is flexible and simple to use. It allowed us to create a network with a 10-node input layer, a hidden layer with the same number of nodes, and an output layer with a single neuron, which should return "0" if the event is a hadron and "1" if it is a gamma. The weights are set randomly at the beginning of the training session and then adjusted in the following runs so as to minimise the error (back-propagation). The error at cycle i is defined as $err_i = \frac{1}{2}\sum_j e_j^2$, where $e_j$ is the error of output node j. Input and output nodes have linear transfer functions, while hidden-layer nodes use a sigmoid, usually $\sigma(x) = 1/(1 + e^{-x})$.
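The network layout described above can be illustrated with a hand-written forward pass. This is a minimal sketch, not the ROOT implementation: the struct Mlp, the bias terms, the uniform weight initialisation in [-0.5, 0.5] and the function eventError are assumptions made for illustration, and the training itself (BFGS or stochastic minimisation) is not shown.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <random>

constexpr std::size_t kIn = 10;      // one input node per image parameter
constexpr std::size_t kHidden = 10;  // hidden layer of the same size

struct Mlp {
    std::array<std::array<double, kIn>, kHidden> w1{};  // input -> hidden weights
    std::array<double, kHidden> b1{};                   // hidden biases (assumed)
    std::array<double, kHidden> w2{};                   // hidden -> output weights
    double b2 = 0.0;                                    // output bias (assumed)

    // Random initial weights, as at the start of a training session.
    explicit Mlp(unsigned seed) {
        std::mt19937 gen(seed);
        std::uniform_real_distribution<double> u(-0.5, 0.5);
        for (auto& row : w1) for (double& w : row) w = u(gen);
        for (double& b : b1) b = u(gen);
        for (double& w : w2) w = u(gen);
        b2 = u(gen);
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // Forward pass: linear input nodes, sigmoid hidden nodes, linear output node.
    double output(const std::array<double, kIn>& x) const {
        double out = b2;
        for (std::size_t h = 0; h < kHidden; ++h) {
            double a = b1[h];
            for (std::size_t i = 0; i < kIn; ++i) a += w1[h][i] * x[i];
            out += w2[h] * sigmoid(a);
        }
        return out;  // after training: near 0 for hadrons, near 1 for gammas
    }
};

// Per-event error for the single output node, assuming err = 0.5 * e^2
// with e = output - target (target 0 = hadron, 1 = gamma).
double eventError(const Mlp& net, const std::array<double, kIn>& x, double target) {
    const double e = net.output(x) - target;
    return 0.5 * e * e;
}
```

Back-propagation would then adjust w1, b1, w2 and b2 to reduce the sum of eventError over the training events.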

We tested the same network with different learning methods offered by the code authors, for example the so-called "stochastic minimisation", based on the Robbins-Monro stochastic approximation, but the default Broyden-Fletcher-Goldfarb-Shanno (BFGS) method proved to be the fastest and gave the best error approximation.
(a) The error functions for the training and test data over 1000 runs.

(b) The histogram of the distributions for gamma and hadron parameters.

Figure 1: MLP classification results
Figures 1(a) and 1(b) show a possible output when using the ROOT package on these data. The first depicts the error function for each run of the network, comparing the training and the test data; note that the greater the number of runs, the better the network behaves. The second shows the distribution of the output-node values, that is, how often the network returns a value near "0" or near "1".
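An output histogram like the one in Figure 1(b) amounts to counting network outputs in bins over [0,1], filled separately for gamma and hadron test events. A minimal sketch follows, with the hypothetical function histogramOutputs; within ROOT one would normally fill a TH1 histogram instead.

```cpp
#include <cstddef>
#include <vector>

// Count network outputs in nBins equal-width bins over [0, 1]; values outside
// the interval are clamped into the first or last bin.
std::vector<int> histogramOutputs(const std::vector<double>& outputs, std::size_t nBins) {
    std::vector<int> counts(nBins, 0);
    if (nBins == 0) return counts;
    for (double y : outputs) {
        const double pos = y * static_cast<double>(nBins);
        std::size_t bin = 0;                       // y <= 0 goes to the first bin
        if (pos >= static_cast<double>(nBins)) bin = nBins - 1;  // y >= 1: last bin
        else if (pos > 0.0) bin = static_cast<std::size_t>(pos);
        ++counts[bin];
    }
    return counts;
}
```

A well-trained network concentrates the gamma histogram near the last bin and the hadron histogram near the first.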



4 Self-Organising Maps (SOM)

The SOM is based on an unsupervised learning technique and is used to classify data sets that carry no labels. It consists of a map of information units, also called neurons, arranged in a two-dimensional grid [7]. Every neuron i of the map is associated with an n-dimensional reference vector $m_i = [m_{i1}, \ldots, m_{in}]$, where n denotes the dimension of the input vectors. The neurons of the map are connected to adjacent neurons by a neighbourhood relation, which dictates the topology, or structure, of the map. The most common topologies in use are rectangular and hexagonal. The learning process of the SOM is as follows:

1. Initialisation phase: Initialise the reference vectors of all neurons in the map with randomly chosen input vectors.

2. Data normalisation: For a better identification of the groups the data have to be normalised. We employed the 'range' method, where each component of the data vectors is linearly rescaled to lie in the interval [0,1].

3. SOM training: Select an input vector x from the data set at random. The best matching unit (BMU) c for this input vector is found in the map using the metric

$$\|x - m_c\| = \min_i \|x - m_i\|,$$

where $m_i$ is the reference vector associated with unit i.

4. Updating step: The reference vectors of the BMU and its neighbourhood are updated according to the rule

$$m_i(t+1) = \begin{cases} m_i(t) + \alpha(t)\,h_{ci}(t)\,[x(t) - m_i(t)], & i \in N_c(t), \\ m_i(t), & \text{otherwise,} \end{cases}$$
where

$h_{ci}(t)$ is the kernel neighbourhood around the winner unit c,

t denotes time (the training step),

$x(t)$ is an input vector drawn at random from the input data set at time t,

$\alpha(t)$ is the learning rate at time t, and

$N_c(t)$ is the neighbourhood set of the winner unit c.

The above equation moves the BMU and its neighbours closer to the input vector. This adaptation to the input vectors forms the basis for the group formation in the map.

5. Data groups visualisation: Steps 3 and 4 are repeated for a selected number of trials, or epochs. After training is complete, the map has unfolded to follow the distribution of the data set, revealing the natural groups that exist in it. The output of the SOM is the set of reference vectors associated with the map units; this set is termed the codebook. To view the groups and the outliers discovered by the SOM we have to visualise the codebook. The U-Matrix is the technique typically used for this purpose.
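The training steps listed above can be sketched in C++ as follows. This is an illustrative sketch, not the authors' implementation: it assumes a rectangular grid, Euclidean distances on the grid, a Gaussian kernel applied to all units, and a learning rate and radius that decrease linearly; the names rangeNormalise and Som are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Vec = std::vector<double>;

// Step 2: 'range' normalisation, mapping each component linearly to [0, 1].
void rangeNormalise(std::vector<Vec>& data) {
    if (data.empty()) return;
    for (std::size_t k = 0; k < data[0].size(); ++k) {
        double lo = data[0][k], hi = data[0][k];
        for (const Vec& x : data) { lo = std::min(lo, x[k]); hi = std::max(hi, x[k]); }
        const double span = hi - lo;
        for (Vec& x : data) x[k] = span > 0.0 ? (x[k] - lo) / span : 0.0;
    }
}

struct Som {
    std::size_t rows, cols;
    std::vector<Vec> m;  // reference vectors m_i, row-major over the grid (the codebook)

    // Step 1: initialise each unit with a randomly chosen (non-empty) input set.
    Som(std::size_t r, std::size_t c, const std::vector<Vec>& data, std::mt19937& gen)
        : rows(r), cols(c), m(r * c) {
        std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
        for (Vec& mi : m) mi = data[pick(gen)];
    }

    // Step 3: best matching unit c with ||x - m_c|| = min_i ||x - m_i||.
    std::size_t bmu(const Vec& x) const {
        std::size_t best = 0;
        double bestD = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < m.size(); ++i) {
            double d = 0.0;
            for (std::size_t k = 0; k < x.size(); ++k) d += (x[k] - m[i][k]) * (x[k] - m[i][k]);
            if (d < bestD) { bestD = d; best = i; }
        }
        return best;
    }

    // Euclidean distance d_ci between units c and i on the rectangular grid.
    double gridDist(std::size_t c, std::size_t i) const {
        const double dr = double(c / cols) - double(i / cols);
        const double dc = double(c % cols) - double(i % cols);
        return std::sqrt(dr * dr + dc * dc);
    }

    // Step 4: m_i <- m_i + alpha * h_ci * (x - m_i) with a Gaussian h_ci,
    // so here the neighbourhood N_c effectively covers the whole map.
    void update(const Vec& x, double alpha, double sigma) {
        const std::size_t c = bmu(x);
        for (std::size_t i = 0; i < m.size(); ++i) {
            const double d = gridDist(c, i);
            const double h = std::exp(-d * d / (2.0 * sigma * sigma));
            for (std::size_t k = 0; k < x.size(); ++k) m[i][k] += alpha * h * (x[k] - m[i][k]);
        }
    }

    // Steps 3-4 repeated for a number of epochs; alpha and sigma decrease
    // linearly (an assumed schedule), with sigma kept at least 0.5.
    void train(const std::vector<Vec>& data, std::size_t epochs, double alpha0,
               double sigma0, std::mt19937& gen) {
        std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
        const std::size_t steps = epochs * data.size();
        for (std::size_t t = 0; t < steps; ++t) {
            const double frac = double(t) / double(steps);
            const double alpha = alpha0 * (1.0 - frac);
            const double sigma = std::max(sigma0 * (1.0 - frac), 0.5);
            update(data[pick(gen)], alpha, sigma);
        }
    }
};
```

After normalising the events with rangeNormalise and calling train, the vectors in m form the codebook that is passed on for visualisation.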

The ON-event data set was used directly as input to the SOM; no prior training with labelled data is required. Owing to its unsupervised behaviour, the SOM discovered the groups in the data set automatically. We worked with two kernel neighbourhoods of the SOM, described below.

a) Gaussian SOM 

The kernel neighbourhood is defined by the Gaussian function

$$h_{ci}(t) = e^{-d_{ci}^2/(2\sigma_t^2)},$$

where $d_{ci}$ is the distance between the winner unit c and the unit i, and $\sigma_t$ is the neighbourhood radius at time t. The classification results are shown in Figure 2(a). The map is a 25 × 25 network trained for 300 epochs; further increases in map size and number of epochs did not improve the results.

b) Cutgaussian SOM 

The kernel neighbourhood is defined by the cut-Gaussian function

$$h_{ci}(t) = e^{-d_{ci}^2/(2\sigma_t^2)}\,\mathbf{1}(\sigma_t - d_{ci}),$$

where $\mathbf{1}(\cdot)$ is the step function, so that only units within the radius $\sigma_t$ of the winner are updated. The classification results are shown in Figure 2(b). The map is a 40 × 30 network trained for 300 epochs; again, further increases in map size and number of epochs did not improve the results. The cut-Gaussian kernel performed better than the Gaussian kernel.
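The two kernel neighbourhoods differ only in whether the Gaussian weight is truncated at the radius σ_t. Below is a minimal sketch of both as plain functions with hypothetical names gaussianKernel and cutGaussianKernel; either can serve as h_ci(t) in the SOM update rule. The step function is taken to equal 1 at d_ci = σ_t, which is an assumed convention.

```cpp
#include <cmath>

// Gaussian neighbourhood: h_ci(t) = exp(-d_ci^2 / (2 sigma_t^2)).
double gaussianKernel(double dci, double sigma) {
    return std::exp(-dci * dci / (2.0 * sigma * sigma));
}

// Cut-Gaussian neighbourhood: the Gaussian value for units with d_ci <= sigma_t,
// zero beyond the radius, i.e. exp(-d_ci^2 / (2 sigma_t^2)) * 1(sigma_t - d_ci).
double cutGaussianKernel(double dci, double sigma) {
    return dci <= sigma ? gaussianKernel(dci, sigma) : 0.0;
}
```

With the cut-Gaussian kernel, units farther than σ_t from the winner are left unchanged, so each update stays local; with the plain Gaussian every unit moves by some, possibly tiny, amount.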




(a) SOM with Gaussian kernel: two groups are discovered, separated by outliers and boundaries.

(b) SOM with cut-Gaussian kernel: again the algorithm found two groups, and this time the outliers are well separated.

Figure 2: SOM classification results

We developed a C++ implementation of the SOM with both kernel neighbourhoods. The trained SOMs are visualised using the U-matrix technique implemented in SOM Toolbox 2.0 in the MATLAB environment [8].
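To illustrate what the U-matrix encodes, here is a simplified per-unit version for a rectangular grid: each cell holds the mean distance from a unit's reference vector to those of its four direct neighbours. This is a sketch under stated assumptions (hypothetical names uMatrix and euclid), not a reimplementation of the SOM Toolbox routine.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Codebook = std::vector<std::vector<double>>;  // reference vectors m_i, row-major grid

double euclid(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
    return std::sqrt(s);
}

// Per-unit U-matrix on a rows x cols rectangular grid: mean distance between a
// unit's reference vector and those of its 4-connected neighbours.
std::vector<double> uMatrix(const Codebook& m, std::size_t rows, std::size_t cols) {
    std::vector<double> u(rows * cols, 0.0);
    const int dr[4] = {-1, 1, 0, 0};
    const int dc[4] = {0, 0, -1, 1};
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            double sum = 0.0;
            int count = 0;
            for (int k = 0; k < 4; ++k) {
                const long nr = static_cast<long>(r) + dr[k];
                const long nc = static_cast<long>(c) + dc[k];
                // Skip neighbours that fall outside the grid.
                if (nr < 0 || nc < 0 || nr >= static_cast<long>(rows) ||
                    nc >= static_cast<long>(cols)) continue;
                const std::size_t j = static_cast<std::size_t>(nr) * cols + static_cast<std::size_t>(nc);
                sum += euclid(m[r * cols + c], m[j]);
                ++count;
            }
            u[r * cols + c] = count > 0 ? sum / count : 0.0;
        }
    }
    return u;
}
```

Displayed as a grey-scale image, high values mark the boundaries between groups and low values their interiors.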

5 Conclusions and Future Work 

In this article we classified the Monte Carlo gamma-ray data of the MAGIC experiment using an MLP and a SOM. Both networks gave good classification results.

The advantage of the SOM algorithm is that it needs no training vectors to find the groups in the data set, i.e. it clusters the data set automatically; its disadvantage is that it cannot label the groups it finds. On the other hand, the MLP, being a supervised technique, identifies the group labels, but its training session can be longer.

As future work we propose combining the MLP and SOM techniques, which could yield better results. The data set would first be processed with the SOM, yielding a clustered data set, which would then be used to train the MLP to label the groups. This should significantly decrease the MLP training time and thus make the classification faster.